Clustering for Data Reduction: A Divide and Conquer Approach
نویسندگان
چکیده
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our “divide and conquer” approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items.
منابع مشابه
Clustering in WSN Based on Minimum Spanning Tree Using Divide and Conquer Approach
Due to heavy energy constraints in WSNs clustering is an efficient way to manage the energy in sensors. There are many methods already proposed in the area of clustering and research is still going on to make clustering more energy efficient. In our paper we are proposing a minimum spanning tree based clustering using divide and conquer approach. The MST based clustering was first proposed in 1...
متن کاملKnowledge Reduction Based on Divide and Conquer Method in Rough Set Theory
The divide and conquer method is a typical granular computing method using multiple levels of abstraction and granulations. So far, although some achievements based on divided and conquer method in the rough set theory have been acquired, the systematic methods for knowledge reduction based on divide and conquer method are still absent. In this paper, the knowledge reduction approaches based on...
متن کاملFree Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. to this end, cylindrical coordinate system, as well as a special numbering scheme were employed. In the second section, divide and conquer method have been used for eigensolution of these structures, where the matrices are in ...
متن کاملA Novel K means Clustering Algorithm for Large Datasets Based on Divide and Conquer Technique
In this paper we propose an efficient algorithm that is based on divide and conquers technique for clustering the large datasets. In our research work we have applied divide and conquer technique on partitions of the large datasets and we have used squared Euclidean distance for measuring the similarity between data points. The partitioning of datasets is done according to the number of cluster...
متن کاملSpeaker diarization using divide-and-conquer
Speaker diarization systems usually consist of two core components: speaker segmentation and speaker clustering. The current state-of-the-art speaker diarization systems usually apply hierarchical agglomerative clustering (HAC) for speaker clustering after segmentation. However, HAC’s quadratic computational complexity with respect to the number of data samples inevitably limits its application...
متن کامل